Time Series Analysis Lab Session 1

CABAL Paul-Louis

2022-10-06

The goal of this lab session is to manipulate data and time series and to analyse them with the autocorrelation function (ACF).

library("TSstudio")
#> Warning: package 'TSstudio' was built under R version 4.1.3
data(USgas)

Studying Gaussian white noise and random walk

Gaussian white noise

Gaussian white noise (GWN) is a stationary and ergodic random process with zero mean, defined by the following fundamental property: any two values of GWN are statistically independent, no matter how close they are in time. The direct implication of this property is that the autocorrelation function of a GWN is zero at every nonzero lag.

First we generate 1000 observations from a normal distribution with mean 0 and a standard deviation of 1.7. This process is denoted (ut).

n = 1000
ut = rnorm(n, 0, 1.7)  # n i.i.d. draws from N(0, 1.7^2)

Then we plot the generated series:

tseries1 = ts(ut)
ts_plot(tseries1)

The autocorrelation function (ACF) describes how data points in a time series are related to their past values. We can now study the ACF of this Gaussian white noise.

y = acf(tseries1)

y
#> 
#> Autocorrelations of series 'tseries1', by lag
#> 
#>      0      1      2      3      4      5      6      7      8      9     10 
#>  1.000 -0.030  0.024 -0.059 -0.038 -0.023  0.004 -0.010 -0.034 -0.006 -0.005 
#>     11     12     13     14     15     16     17     18     19     20     21 
#> -0.009  0.027 -0.030  0.003  0.051  0.024 -0.003 -0.053  0.000 -0.044 -0.001 
#>     22     23     24     25     26     27     28     29     30 
#>  0.026 -0.042 -0.030 -0.002  0.002 -0.009  0.013 -0.006 -0.017

The ACF nicely reflects the definition of white noise: at lag 0 the autocorrelation equals 1, and the values at all other lags are not statistically significant (effectively zero). This ACF therefore illustrates GWN perfectly.
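The claim that the other lags are "not significant" can be made precise: for white noise, the sample autocorrelations are approximately normal with variance 1/n, so values inside ±1.96/√n are compatible with zero. A minimal sketch (the seed and series are illustrative, not the one generated above):

```r
# Approximate 95% confidence band for the sample ACF of white noise:
# under the null, each sample autocorrelation is roughly N(0, 1/n).
n = 1000
band = 1.96 / sqrt(n)          # about 0.062 for n = 1000
set.seed(42)
rho = acf(rnorm(n, 0, 1.7), plot = FALSE)$acf[-1]  # drop lag 0
mean(abs(rho) < band)          # most lags should fall inside the band
```

This band is exactly the dashed blue line that `acf()` draws on its plot.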

Random Walk

Now let’s run the same analysis using a random walk process. Formally speaking, yt is defined by:

yt = yt−1 + ut

A random walk can be seen as a cumulative sum of Gaussian white noise increments. Let’s generate the series:

set.seed(123)
# cumulative sum of the white-noise increments: ut becomes the random walk
for (t in 2:1000) ut[t] = ut[t-1] + ut[t]
plot(ut, type = "l")

Now we plot the ACF and analyse it:

y = acf(ut)

y
#> 
#> Autocorrelations of series 'ut', by lag
#> 
#>     0     1     2     3     4     5     6     7     8     9    10    11    12 
#> 1.000 0.989 0.978 0.967 0.957 0.948 0.939 0.930 0.922 0.914 0.906 0.898 0.891 
#>    13    14    15    16    17    18    19    20    21    22    23    24    25 
#> 0.883 0.876 0.868 0.860 0.851 0.842 0.834 0.826 0.818 0.811 0.804 0.797 0.791 
#>    26    27    28    29    30 
#> 0.785 0.779 0.773 0.767 0.761

This model is hard to use because every price (or data point) is correlated with the previous one. If we reduce the sample, for example to t = 10, the ACF will look closer to that of a Gaussian white noise.


set.seed(1230)
# repeat the recursion over the first 10 observations only
for (t in 2:10) ut[t] = ut[t-1] + ut[t]
plot(ut, type = "l")

par(mar = rep(2, 4))
par(mfrow = c(2, 1))
# the lag far exceeds the sample size, so the statistic cannot be computed (NA)
Box.test(ut, lag = 1000000, type = 'Ljung')
#> 
#>  Box-Ljung test
#> 
#> data:  ut
#> X-squared = NA, df = 1e+06, p-value = NA

acf(ut, lag = 100) 
acf(tseries1, lag = 100)

Conclusion

The bigger the lag, the more the ACF of a random walk looks like that of a Gaussian white noise.

A random walk is integrated white noise: if we apply a first-order difference to the random walk, we recover the white-noise process.
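This can be checked directly: differencing the cumulative sum of white-noise increments gives back those increments. A small sketch (the seed and series here are illustrative):

```r
set.seed(123)
increments = rnorm(1000, 0, 1.7)   # Gaussian white noise
walk = cumsum(increments)          # random walk y_t = y_{t-1} + u_t
# first-order difference of the walk recovers the white-noise increments
all.equal(diff(walk), increments[-1])
#> [1] TRUE
```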

Importing, plotting and analyzing data

Let’s import the database contained in the csv file using the read.csv function. This database is a collection of stock prices.

data = read.csv("C:\\Users\\paull\\Documents\\Econometric\\data.csv", header = TRUE,sep=",")

dates = as.Date(data$X, format = "%m/%d/%Y")
str(dates)
#>  Date[1:4623], format: "2004-12-31" "2005-01-03" "2005-01-04" "2005-01-05" "2005-01-06" ...

We plot the third and fifth columns of this dataframe.

plot(dates,data$F.LVMH,type="l",col = "green")
lines(dates,data$S.ROG,type="l",col = "black")

Let’s plot the density of the selected stocks’ daily returns; then, using a quantile plot, we will check the normality of the empirical distribution of the daily returns.

LVMH = data$F.LVMH
ROG = data$S.ROG

# daily simple returns: p_t / p_{t-1} - 1
LVMH.dr = matrix(0, length(LVMH))
for (i in 2:length(LVMH)) {
  LVMH.dr[i-1] = LVMH[i] / LVMH[i-1] - 1
}

ROG.dr = matrix(0, length(ROG))
for (i in 2:length(ROG)) {
  ROG.dr[i-1] = ROG[i] / ROG[i-1] - 1
}
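The same daily returns can be computed without a loop using `diff` (the loops above also leave a trailing zero, since the result vector has one more entry than there are returns). A sketch on a small hypothetical price series:

```r
prices = c(100, 102, 101, 105)             # hypothetical price series
returns = diff(prices) / head(prices, -1)  # p_t / p_{t-1} - 1, vectorized
round(returns, 4)
#> [1]  0.0200 -0.0098  0.0396
```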

LVMH.dr.density = density(LVMH.dr)
ROG.dr.density = density(ROG.dr)

par(mar = rep(2, 4))
par(mfrow = c(2,2))
plot(LVMH.dr.density)
plot(ROG.dr.density)

qqnorm(LVMH.dr)
qqline(LVMH.dr)  # reference line for a normal distribution
qqnorm(ROG.dr)
qqline(ROG.dr)

Let’s calculate the autocovariance and autocorrelation functions of the daily returns, and compute the Ljung-Box test for various numbers of lags.

par(mar = rep(2, 4))
par(mfrow = c(2,1))
Box.test(LVMH.dr, lag = 100,type = 'Ljung')
#> 
#>  Box-Ljung test
#> 
#> data:  LVMH.dr
#> X-squared = 191.4, df = 100, p-value = 1.05e-07
acf(LVMH.dr)
acf(ROG.dr)

Box.test(LVMH.dr, lag = 20,type = 'Ljung')
#> 
#>  Box-Ljung test
#> 
#> data:  LVMH.dr
#> X-squared = 41.363, df = 20, p-value = 0.003345

We observe that the smaller the lag, the bigger the p-value and the smaller the X-squared value. Let’s plot the same ACF for the daily squared returns and compute the Ljung-Box test for various numbers of lags.

# daily squared returns: (p_t / p_{t-1} - 1)^2
LVMH.dr.squared = matrix(0, length(LVMH))
for (i in 2:length(LVMH)) {
  LVMH.dr.squared[i-1] = (LVMH[i] / LVMH[i-1] - 1)^2
}

ROG.dr.squared=matrix(0,length(ROG))
for (i in 2:length(ROG)){
  ROG.dr.squared[i-1]=(ROG[i]/ROG[i-1]-1)^2
}
par(mfrow=c(2,2))
par(mar = rep(2, 4))
acf(LVMH.dr.squared)
acf(ROG.dr.squared)
Box.test(LVMH.dr.squared,lag=100, type="Ljung")
#> 
#>  Box-Ljung test
#> 
#> data:  LVMH.dr.squared
#> X-squared = 3239.5, df = 100, p-value < 2.2e-16
Box.test(LVMH.dr.squared,lag=20, type="Ljung")
#> 
#>  Box-Ljung test
#> 
#> data:  LVMH.dr.squared
#> X-squared = 1475.6, df = 20, p-value < 2.2e-16
Box.test(ROG.dr.squared,lag=20, type="Ljung")
#> 
#>  Box-Ljung test
#> 
#> data:  ROG.dr.squared
#> X-squared = 1381.2, df = 20, p-value < 2.2e-16

With squared values, the p-value is smaller and appears constant regardless of the lag, but the higher the lag, the higher the X-squared value.

Conclusion

To conclude, the p-value is very small, so we can reject the hypothesis that the time series is not stationary. The time series is therefore STATIONARY.

The Maximum likelihood estimator and functions

The purpose of this exercise is to write a function that calculates:

■ The likelihood of a sample generated by an exponential distribution

■ And finds the parameter θ underlying this sample.


set = rexp(n = 1000, rate = 3)
likelihood = function(set, rate) {
  n = length(set)
  # L(rate) = rate^n * exp(-rate * sum(set)); overflows/underflows for large n
  return(rate^n * exp(-rate * sum(set)))
}
likelihood(set, 3)
#> [1] NaN

# MLE of the exponential rate: theta_hat = n / sum(set) = 1 / mean(set)
teta = function(set) {
  n = length(set)
  return(n / sum(set))
}
teta(set)
#> [1] 3.118225
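The `NaN` above comes from numerical overflow: for n = 1000, `rate^n` is `Inf` while `exp(-rate*sum(set))` underflows to 0, and `Inf * 0` is `NaN`. Working with the log-likelihood avoids this; a sketch (the function name `loglikelihood` and the seed are assumptions, not part of the original lab):

```r
loglikelihood = function(set, rate) {
  n = length(set)
  # log L(rate) = n*log(rate) - rate*sum(set), finite where L itself overflows
  n * log(rate) - rate * sum(set)
}

set.seed(1)
set = rexp(n = 1000, rate = 3)
is.finite(loglikelihood(set, 3))           # TRUE, unlike likelihood(set, 3)
mle = length(set) / sum(set)               # the estimator from teta() above
loglikelihood(set, mle) >= loglikelihood(set, 3)
#> [1] TRUE
```

The last line holds by construction: the MLE maximizes the log-likelihood, so no other rate can do better on this sample.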

Let’s try with a sample generated by a Gaussian distribution

set = rnorm(n=1000)

teta=function(set){
  n=length(set)
  return(n/sum(set))
}
teta(set)
#> [1] -29.24364
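The value above is meaningless because `teta` still computes n/sum(set) = 1/mean(set), which is the exponential-rate MLE, not a Gaussian estimator. For a Gaussian sample the maximum-likelihood estimators are the sample mean and the biased sample variance; a sketch (the seed and parameters are illustrative):

```r
set.seed(7)
x = rnorm(1000, mean = 2, sd = 1.5)
mu.hat = mean(x)                    # MLE of the mean
sigma2.hat = mean((x - mu.hat)^2)   # MLE of the variance (divides by n, not n-1)
c(mu.hat, sigma2.hat)               # close to 2 and 1.5^2 = 2.25
```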